Lift by_path + controls gate (DID^X residualization)#378
Conversation
|
Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…note
CI reviewer flagged that the new by_path + controls combination
silently produces point-estimate divergence from R on multi-baseline
switcher panels (R re-runs per-baseline residualization on each
path's restricted subsample; we residualize once globally). The
parity test docstring documented the deviation but REGISTRY.md and
the runtime did not.
Fixes:
- Emit UserWarning in fit() when by_path + controls is used on a
panel with multiple switcher D_{g,1} values (chaisemartin_dhaultfoeuille.py
inside the controls residualization block, after _compute_group_switch_metadata)
- Update the by_path docstring with an explicit "Deviation from R on
multi-baseline switcher panels" paragraph
- Update REGISTRY.md "Per-path covariate residualization (DID^X)"
paragraph to document the point-estimate deviation alongside the
existing SE deviation
- Update CHANGELOG entry to call out the multi-baseline deviation
- Update R-generator scenario 16 comment to correctly describe R's
per-path re-residualization (the prior comment misstated R's
behavior as "residualize once globally")
- Update parity test class docstring to be precise about R's
per-path call site (R/R/did_multiplegt_dyn.R lines 393-411)
- Add two regression tests:
* test_multi_baseline_panel_emits_r_deviation_warning — joiner +
leaver + always-treated + never-treated panel triggers the warning
* test_single_baseline_panel_does_not_emit_r_deviation_warning —
standard 3-path joiners-only fixture does NOT trigger the warning
The single-baseline R-parity scenario (multi_path_reversible_by_path_controls)
remains exact-match (rtol ~1e-11) because all switchers in the DGP
share D_{g,1}=0 and R's per-path control pool reduces to the global
control pool we use.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
PR #357 shipped by_path foundation; PRs #364/#371/#374 completed the inference surface (bootstrap, placebos, sup-t bands). Wave 3 begins design-variant extensions; this PR is item #5: combine by_path=k with controls=[...] (DID^X). Architecture: the per-baseline OLS residualization at chaisemartin_dhaultfoeuille.py:1498 runs once on the full panel BEFORE path enumeration, so all four downstream surfaces (analytical SE, bootstrap SE, per-path placebos, per-path joint sup-t bands) consume the residualized Y_mat automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing controls + per-period DID contract. Canonical R behavior: `did_multiplegt_dyn(..., by_path=k, controls=...)` re-runs the per-baseline residualization on each path's restricted subsample (path's switchers + same-baseline not-yet-treated controls). On the multi_path_reversible DGP all switchers share baseline D_{g,1}=0, so R's per-path control pool equals our global control pool and the residualization coefficients coincide. Per-path point estimates match R exactly (rtol ~1e-11); per-path SE within ~6.5% (Phase 2 envelope, inheriting the documented cross-path cohort- sharing deviation). Changes: - Delete the gate at chaisemartin_dhaultfoeuille.py:988-992 - Update by_path docstring (remove `controls` from incompatible list, add inheritance paragraph) - New R parity scenario `multi_path_reversible_by_path_controls` in benchmarks/R/generate_dcdh_dynr_test_values.R + regenerated golden values - New TestDCDHDynRParityByPathControls in tests/test_chaisemartin_dhaultfoeuille_parity.py - New TestByPathControls in tests/test_chaisemartin_dhaultfoeuille.py (12 tests covering analytical / bootstrap / placebo / sup-t / cband to_dataframe / per-period unadjusted / covariate_residuals round- trip / multi-covariate) - Remove the `controls` parametrize entry from TestByPathGates::test_forbids_phase3_fit_kwargs - Update REGISTRY.md (remove `controls` from gated-combos list, add inheritance sub-paragraph documenting the four-surface auto- inheritance) - CHANGELOG: Unreleased > Added entry Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…note
CI reviewer flagged that the new by_path + controls combination
silently produces point-estimate divergence from R on multi-baseline
switcher panels (R re-runs per-baseline residualization on each
path's restricted subsample; we residualize once globally). The
parity test docstring documented the deviation but REGISTRY.md and
the runtime did not.
Fixes:
- Emit UserWarning in fit() when by_path + controls is used on a
panel with multiple switcher D_{g,1} values (chaisemartin_dhaultfoeuille.py
inside the controls residualization block, after _compute_group_switch_metadata)
- Update the by_path docstring with an explicit "Deviation from R on
multi-baseline switcher panels" paragraph
- Update REGISTRY.md "Per-path covariate residualization (DID^X)"
paragraph to document the point-estimate deviation alongside the
existing SE deviation
- Update CHANGELOG entry to call out the multi-baseline deviation
- Update R-generator scenario 16 comment to correctly describe R's
per-path re-residualization (the prior comment misstated R's
behavior as "residualize once globally")
- Update parity test class docstring to be precise about R's
per-path call site (R/R/did_multiplegt_dyn.R lines 393-411)
- Add two regression tests:
* test_multi_baseline_panel_emits_r_deviation_warning — joiner +
leaver + always-treated + never-treated panel triggers the warning
* test_single_baseline_panel_does_not_emit_r_deviation_warning —
standard 3-path joiners-only fixture does NOT trigger the warning
The single-baseline R-parity scenario (multi_path_reversible_by_path_controls)
remains exact-match (rtol ~1e-11) because all switchers in the DGP
share D_{g,1}=0 and R's per-path control pool reduces to the global
control pool we use.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a3de936 to
028e7e1
Compare
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Older Unreleased by_path entry (PR #357 origin) listed `controls` in the gated-combos list, contradicting the new entry that says `by_path + controls` is supported. Same internal contradiction applied implicitly to the bootstrap, placebo, and sup-t bands extensions that landed in the same Unreleased block — those weren't in the gated list to begin with, but the entry didn't acknowledge they're now supported. Updated the by_path bullet to remove `controls` from the gated list and explicitly acknowledge `n_bootstrap > 0`, `placebo=True`, joint sup-t bands, and `controls` as supported (see dedicated entries elsewhere in [Unreleased]). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality No material findings beyond the methodology issue above. Performance No material findings. Maintainability No separate findings. The main maintainability risk is the incorrect methodology contract being duplicated across code comments, docs, tests, and benchmark notes. Tech Debt No separate findings. I did not find a Security No findings. Documentation/Tests
Path to Approval
|
…egression
Reviewer flagged that the parity condition documented for by_path +
controls (single-baseline switcher panel) might not be sufficient,
hypothesizing that R's per-path subset could exclude pre-switch
rows of other-path switchers and produce a different first-stage
residualization sample.
Verified the hypothesis is empirically falsified and analytically
incorrect by reading R/R/did_multiplegt_dyn.R lines 401-405 line-by-
line. R's per-path subset for path B includes:
- Rows where path_XX == B (path-B switchers, all rows)
- OR rows where yet_to_switch=1 AND baseline matches (pre-switch
rows of any group with matching baseline, regardless of path)
So R's per-path first-stage sample equals (pre-switch rows of all
switchers with matching baseline + all rows of never-switchers with
matching baseline) — bit-identical to our global first-stage sample
under single-baseline switcher panels, regardless of how F_g varies
across paths or within a path. Empirical confirmation: scenario 16
(`multi_path_reversible_by_path_controls`) has switcher F_g spanning
[0..6] across 4 distinct paths under D_{g,1}=0 and Python matches
R to rtol ~1e-11 across all (path, horizon) cells.
Strengthened the contract:
- Expanded the warning code comment to spell out R's per-path subset
construction (citing R source line numbers) and why single-baseline-
switcher is the precise parity condition (control-pool equivalence
via the OR clause), with the empirical scenario reference baked in
- Updated REGISTRY.md "Per-path covariate residualization (DID^X)"
paragraph to cite R lines 401-405 and clarify never-switcher
baselines do not affect parity
- New regression test
`test_single_baseline_heterogeneous_F_g_no_warning_and_matches_r`
uses the golden-value scenario (single-baseline, heterogeneous F_g
across paths) to assert: (a) no UserWarning fires, (b) per-path
point estimates are produced finite. The numeric R-parity is locked
separately in TestDCDHDynRParityByPathControls.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Looks good Executive Summary
Methodology
Previous P1 resolved: the updated sample-equivalence argument for heterogeneous Code Quality No material findings. Performance No material findings. Maintainability
Tech Debt No separate findings. I did not see a Security No findings. Documentation/Tests
|
…scope P3 #1 (Methodology): qualified the "exact R match" claim across docstring / REGISTRY / CHANGELOG / R-generator comment / parity test docstring with a cross-reference to the existing DID^X cell-weighting deviation (Python's first-stage uses equal cell weights, R weights by N_gt). The two coincide on one-observation-per-(g,t) panels (the common cell-aggregated regime that the parity scenario uses). The multi-observation-per-cell deviation is independent of the by_path lift and was already documented in REGISTRY's "Note (Phase 3 DID^X covariate adjustment)". P3 #2 (Maintainability): narrowed the Step 7b header comment in chaisemartin_dhaultfoeuille.py:1465-1473 to spell out that DID^X residualization applies to the per-group multi-horizon path (event_study_effects, overall_att, joiners/leavers, by_path, placebos, sup-t bands) but intentionally excludes per_period_effects which stays on raw outcomes per the existing "Note (Phase 3 DID^X covariate adjustment)" contract. Documentation-only fix; no runtime behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Test name `test_single_baseline_heterogeneous_F_g_no_warning_and_matches_r` oversold its scope — the body only checks warning suppression and finite effects. The numeric R-parity assertion (rtol ~1e-11 per-path) lives in TestDCDHDynRParityByPathControls which fits the same scenario and compares against the golden values. Renamed to `test_single_baseline_heterogeneous_F_g_does_not_warn` and updated docstring + inline comment to spell out: this test locks the warning-suppression invariant; the numeric R-parity check is in the parity test class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
by_pathfollow-up sequence: combineby_path=kwithcontrols=[...](DID^X residualization)diff_diff/chaisemartin_dhaultfoeuille.py:1498runs once on the full panel BEFORE path enumeration, so all four downstream surfaces (analytical SE / bootstrap SE / per-path placebos / per-path joint sup-t bands) auto-inherit the residualizedY_mat(Frisch-Waugh-Lovell)multi_path_reversible_by_path_controlsvalidates per-path point estimates against canonical Rdid_multiplegt_dyn(..., by_path=3, controls="X1")— exact match (rtol ~1e-11) on the single-baseline DGPMethodology notes
R does per-path re-residualization with a path-restricted subsample (path's switchers + same-baseline not-yet-treated controls). Our architecture residualizes once globally, then disaggregates. On the
multi_path_reversibleDGP all switchers share baselineD_{g,1}=0, so R's per-path control pool equals our global control pool — residualization coefficients coincide and per-path point estimates match exactly. Per-path SE inherits the documented cross-path cohort-sharing deviation from R (Phase 2 envelope, ~6.5% rtol on this scenario).Per-period effects remain unadjusted. The per-period DID path uses raw
Y_matper the existing controls + per-period contract documented atchaisemartin_dhaultfoeuille.py:1493-1496. Pinned bytest_per_period_effects_unadjusted_with_by_path_controls.Files changed
diff_diff/chaisemartin_dhaultfoeuille.py:988-992; updateby_pathdocstring (removecontrolsfrom incompatible list, add inheritance paragraph)tests/test_chaisemartin_dhaultfoeuille.pycontrolsparametrize entry fromTestByPathGates; addTestByPathControlsclass (12 tests)tests/test_chaisemartin_dhaultfoeuille_parity.pyTestDCDHDynRParityByPathControlsclassbenchmarks/R/generate_dcdh_dynr_test_values.Rmulti_path_reversible_by_path_controlsbenchmarks/data/dcdh_dynr_golden_values.jsonRscript benchmarks/R/generate_dcdh_dynr_test_values.Rdocs/methodology/REGISTRY.mdcontrolsfrom gated-combos list in the Phase-3 by_path Note; add inheritance sub-paragraphCHANGELOG.mdTestByPathControlscovers all four downstream surfaces explicitly (analytical / bootstrap / placebo / sup-t bands), pluscovariate_residualsround-trip,to_dataframe(level="by_path")cband columns under controls + bootstrap, multi-covariate, path-enumeration-unaffected-by-controls, and the per-period-unadjusted contract.Test plan
pytest tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls -m "slow or not slow"— 12 passedpytest tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls— 1 passed (point estimates exact match, SE within Phase 2 envelope)pytest tests/test_chaisemartin_dhaultfoeuille.py -k "ByPath"— 88 passed (all existing by_path classes regression-clean)pytest tests/test_chaisemartin_dhaultfoeuille.py::TestCovariateAdjustment— 8 passed (existing controls regression-clean)pytest tests/test_chaisemartin_dhaultfoeuille_parity.py— 19 passedtests/test_chaisemartin_dhaultfoeuille.py + test_methodology_chaisemartin_dhaultfoeuille.py + test_chaisemartin_dhaultfoeuille_parity.py) — 231 passed🤖 Generated with Claude Code